ABSTRACT

A new technique for proving lower bounds for parallel computation is
introduced. This technique enables us to obtain, for the first time,
non-trivial tight lower bounds for shared-memory models of parallel
computation that allow several processors to have simultaneous access to the
same memory location. Specifically, we use a concurrent-read concurrent-write
model of parallel computation. It has $p$ processors, each of which has access
to a common memory of size $m$ (also called communication width, or width for
short). The input to the problem is located in an additional read-only portion
of the common memory.

For a wide variety of problems (including parity, majority and summation) we
show that the time complexity $T$ (depth) and the communication width $m$ are
related by the trade-off curve $mT^2 = \Omega(n)$ (where $n$ is the size of the
input), regardless of the number of processors. Moreover, for every point on
this curve with $m = O(n/\log^2 n)$ we give a matching upper bound with the
optimal number of processors.

We extend our technique to prove an $mT^3 = \Omega(n)$ trade-off for a class of
"simpler" functions (including Boolean OR) on a weaker model that forbids
simultaneous write access. We also state and give a proof of a new result by
Beame [B-83] that achieves a tight lower bound for the OR in this model, namely
$mT^2 = \Omega(n)$. These results improve the lower bound of Cook and Dwork
[CD-82] when communication is limited.
1. Introduction

Consider the following informal problem: there are a large number of people
(or processing units), each of whom knows $n$ numbers $a_1, a_2, \ldots, a_n$.
They all wish to compute the sum of these numbers. If they cannot communicate,
there is no way to avoid sequential ($\Omega(n)$ time) summation by each person
separately. On the other hand, it is shown in the paper that with only one
communication channel (one cell of shared memory) this time can be reduced to
$O(\sqrt{n})$. With $n$ (resp. $2^n$) shared memory cells the time can be
reduced further to $O(\log n)$ (resp. $O(1)$). This exemplifies that a
communication facility is essential for any utilization of parallelism, and
that its size directly affects the performance of the algorithm.

The  size  of  the  common  memory  required  by  a  given  parallel  algorithm  will 
be  determined  by  two  principal  factors. 

(a)  Input  availability. 

The size of the input, in the case that the input is placed in the common
memory, or the need to transfer input data in the case that the input is
initially distributed among the local memories.

(b)  Cooperation  between  processors. 

The  transmission  of  intermediate  results  between  processors,  utilized  to 
obtain  fast  processing  time. 


' This research was conducted while the second author was in the EECS
department at Princeton University.
Here we propose to concentrate on point (b). For this reason we put the
input in a "read only" common memory.

In this paper we will concentrate on parallel RAMs (PRAMs), in particular the
Concurrent-Read Concurrent-Write PRAM (CRCW PRAM) and the Concurrent-Read
Exclusive-Write PRAM (CREW PRAM). Both models are precisely defined in
section 2. In the above models processors communicate via a shared memory.
Therefore the size of the communication facility of the machine, here called
communication width (or width for short), is simply the number of shared
memory cells. We consider the width $m$ a resource, together with the size $p$
(the number of processors) and the depth $T$ (the running time), and we seek
trade-offs between the three.

One of the subtleties in proving lower bounds for these models is that
information may be communicated by the fact that no processor writes into a
common memory cell. We introduce a novel technique to deal with this difficulty.

For a large class of functions, which includes Parity and Majority, we prove
$T = \Omega(\sqrt{n/m})$ on the CRCW PRAM, where $n$ is the size of the input.
This lower bound is tight for all values of width $m = O(n/\log^2 n)$. This is
the first time non-trivial tight lower bounds are achieved for a model that
allows concurrent write access. The only known lower bound on the CRCW PRAM
model is given in Stockmeyer and Vishkin [SV-82]. They show, using a result of
Furst, Saxe, and Sipser [FSS-81], that it is impossible to compute parity in
this model in constant time using a polynomial number of processors. There is,
however, a large gap between this lower bound and the best upper bound known
for a polynomial number of processors, which is $O(\log n/\log\log n)$ (see
[CSV-82]).

For another class of functions, which includes the functions AND and OR, we
prove a lower bound of $T = \Omega((n/m)^{1/3})$ on the CREW PRAM. This lower
bound extends the $\Omega(\log n)$ of Cook and Dwork for small values of $m$,
and further discerns the power of the CRCW PRAM from that of the CREW PRAM. At
this point we state, and give a proof of, a new result by Beame that achieves a
tight lower bound for computing the OR in this model. For a different class of
functions (that includes OR) he proves $T = \Omega(\sqrt{n/m})$.

Both  our  lower  bounds  hold  regardless  of  the  number  of  processors,  while 
the  upper  bounds  are  achieved  with  the  smallest  possible  number  of  processors. 

Our study of values of $m$ which are smaller than the input size requires us to
add a read-only input tape to the model, as is done in the study of
space-bounded Turing machines. The interest in those values is not solely
theoretical; it is well founded in practice. For example, the "Ethernet" can be
considered as a PRAM with only one shared memory cell. Also, the papers of
Gottlieb et al. [GGKMRS-82], Kuck [K-77] and Vishkin [V-82] imply that
minimizing the size of shared memory (that can be accessed in parallel) may
amount to hardware feasibility of the parallel machine.

The paper is organized as follows: precise definitions and the lower bounds
are given in section 2. Section 3 contains the upper bounds and section 4
concludes the paper and suggests further research directions. To improve the
readability of section 2, some of the proofs were deferred to the appendix.

2.  Lower  Bounds 

In the first subsection we give precise definitions of the models of
computation when the communication width $m = 1$, and of the types of
functions we are interested in. Subsections 2 and 3 contain the lower bound
proofs for the concurrent-write and exclusive-write models respectively when
$m = 1$. In the last subsection we show how to extend the lower bounds to
arbitrary communication width.

2.1.  Definitions 

Definition 2.1: A CRCW PRAM(1) consists of a set $\Pi = \{p_1, p_2, \ldots\}$
of processors, a number $n$ of inputs, $n$ read-only input cells
$X(1), X(2), \ldots, X(n)$, one common memory cell $C$, an alphabet $\Sigma$
and an execution time $T$.

Each processor $p_i$ has a set of states $Q_i$ and functions

$\rho_i : Q_i \to \{1, 2, \ldots, n\}$ - the next input cell to be read,

$\sigma_i : Q_i \to \Sigma$ - the symbol to be written into $C$, and

$\delta_i : Q_i \times \Sigma \times \Sigma \to Q_i$ - the state transition function.

At each time period $t = 0, 1, \ldots, T$ each processor $p_i$ is in a state
$q_i^t \in Q_i$, and the cell $C$ contains a symbol $s^t \in \Sigma$. At time
$t = 0$ the input cell $X(i)$ contains the input $x_i$ ($\in \Sigma$)
($1 \le i \le n$), the cell $C$ contains a designated symbol $b_0 \in \Sigma$,
and every processor $p_i$ is in an initial state $q_i^0 \in Q_i$. In general

$q_i^{t+1} = \delta_i(q_i^t, X(j), s^t)$ where $j = \rho_i(q_i^t)$, and

$s^{t+1} = \begin{cases}
  s^t & \text{if for every } i,\ \sigma_i(q_i^{t+1}) = s^t \text{ (no one writes)} \\
  \sigma_i(q_i^{t+1}) & \text{where } i \text{ is the smallest s.t. } \sigma_i(q_i^{t+1}) \ne s^t
\end{cases}$
The value $f(x_1, x_2, \ldots, x_n)$ of the function $f$ computed by the
PRAM(1) is the contents $s^T$ of $C$ at time $T$.
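The step rule of Definition 2.1 can be illustrated with a small simulator. This is only a sketch under assumptions: the names `Processor`, `run_crcw_pram1` and `or_machine` are hypothetical, states are ordinary Python values, and the example machine at the end is an illustration rather than a construction taken from the paper.

```python
class Processor:
    """Processor p_i of Definition 2.1 (all names here are illustrative).

    rho(q)          -> index (1..n) of the next input cell to read
    sigma(q)        -> symbol this processor wants C to hold
    delta(q, xv, s) -> next state, given the input value read and C's contents
    """

    def __init__(self, q0, rho, sigma, delta):
        self.q0 = q0
        self.rho = rho
        self.sigma = sigma
        self.delta = delta


def run_crcw_pram1(procs, x, b0, T):
    """Run T time periods and return s^T, the final contents of C."""
    states = [p.q0 for p in procs]  # q_i^0
    s = b0                          # s^0
    for _ in range(T):
        # q_i^{t+1} = delta_i(q_i^t, X(j), s^t) with j = rho_i(q_i^t)
        states = [p.delta(q, x[p.rho(q) - 1], s) for p, q in zip(procs, states)]
        # s^{t+1}: the smallest-numbered processor whose symbol differs from s^t wins
        for p, q in zip(procs, states):
            if p.sigma(q) != s:
                s = p.sigma(q)
                break
    return s


def or_machine(n):
    """Hypothetical example: processor i reads X(i) and proposes max(x_i, s)."""
    return [
        Processor(
            (i, 0),
            lambda q: q[0],
            lambda q: q[1],
            lambda q, xv, s: (q[0], max(xv, s)),
        )
        for i in range(1, n + 1)
    ]
```

A processor that does not want to write simply proposes the current contents of C, which matches the convention that "writing" means trying to change C.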

Definition 2.2: A CREW PRAM(1) is defined exactly like the CRCW PRAM(1), with
only one exception: at each time period $t$ there can be at most one processor
that writes, i.e. at most one $i$ s.t. $\sigma_i(q_i^{t+1}) \ne s^t$.

Remarks: These lower bound models allow $\Pi$, $\Sigma$ and $Q_i$ to be
infinite, and allow the processors to be non-uniform (i.e. to have different
programs for different values of $n$). Also note that we use the convention
that a processor writes if it tries to change the contents of $C$ (and in the
CRCW it must be the one with the smallest serial number doing so).

Definition 2.3: Let $\Sigma$ be a set, $I \subseteq \Sigma^n$ and
$f : I \to \Sigma$ some function. An input $x = x_1, x_2, \ldots, x_n \in I$ is
said to be $k$-sensitive w.r.t. $f$ if for every subset
$J \subseteq \{1, 2, \ldots, n\}$, $|J| = k-1$, there exists another input
$y = y_1, y_2, \ldots, y_n \in I$ s.t. $x_j = y_j$ for all $j \in J$, and
$f(x) \ne f(y)$. If $k$ is the largest integer s.t. every input (resp. some
input) $x \in I$ is $k$-sensitive w.r.t. $f$, then $f$ is said to be
$k$-sensitive everywhere (resp. $k$-sensitive somewhere).

Examples: Consider the functions Parity, Majority, OR : $\{0,1\}^n \to \{0,1\}$.

Parity is $n$-sensitive everywhere.

Majority is $\lceil n/2 \rceil$-sensitive everywhere.

OR is only 1-sensitive everywhere for all $n$, but it is $n$-sensitive
somewhere (the all zeros input is $n$-sensitive w.r.t. OR).
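For small $n$ these examples can be checked by brute force directly from Definition 2.3. The sketch below assumes the full Boolean domain $\{0,1\}^n$; the function names are hypothetical and the search is exponential, so it is only meant for tiny inputs.

```python
from itertools import combinations, product


def is_k_sensitive(f, x, k):
    """True if, for every set J of k-1 positions, some y agreeing with x on J has f(y) != f(x)."""
    n = len(x)
    domain = list(product((0, 1), repeat=n))
    for J in combinations(range(n), k - 1):
        if not any(
            f(y) != f(x) and all(y[j] == x[j] for j in J) for y in domain
        ):
            return False
    return True


def sensitivity_of_input(f, x):
    """Largest k such that x is k-sensitive (0 if f cannot change at all)."""
    k = 0
    while k < len(x) and is_k_sensitive(f, x, k + 1):
        k += 1
    return k


def sensitivity_everywhere(f, n):
    """Largest k such that every input is k-sensitive."""
    return min(sensitivity_of_input(f, x) for x in product((0, 1), repeat=n))


def sensitivity_somewhere(f, n):
    """Largest k such that some input is k-sensitive."""
    return max(sensitivity_of_input(f, x) for x in product((0, 1), repeat=n))


def parity(x):
    return sum(x) % 2


def boolean_or(x):
    return int(any(x))


def majority(x):
    return int(2 * sum(x) > len(x))
```

With these helpers, `sensitivity_everywhere(boolean_or, 4)` and `sensitivity_somewhere(boolean_or, 4)` show the gap between the two notions for OR.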

2.2. Lower bounds for the CRCW PRAM(1)

Theorem 2.1: Let $M$ be a CRCW PRAM(1) that computes a $k$-sensitive everywhere
function $f$ in time $T$. Then $T = \Omega(\sqrt{k})$.

Corollary 2.1: Let $M$ be a CRCW PRAM(1) that computes the Parity, Majority
(Sum, Max) function on $n$ bits (integers) in time $T$. Then
$T = \Omega(\sqrt{n})$.

Proof: Parity, Sum and Max are $n$-sensitive everywhere. Majority is
$\lceil n/2 \rceil$-sensitive everywhere. •

Let  us  informally  discuss  the  difficulties  we  are  facing  in  trying  to  prove 
theorem  2.1.  Consider  the  behaviour  of  the  machine  in  time  period  t.  There  are 
two  possible  cases: 

Case 1: No processor writes into $C$ ($s^t = s^{t-1}$).
Case 2: Some processor (say $p_j$) writes into $C$ ($s^t \ne s^{t-1}$).

We have to analyze the information that is transferred in each case. Consider
first case 1. As Cook and Dwork [CD-82] point out, information is transferred
in this case, namely the information that nobody wrote. They show how this
information can be used in an algorithm for the OR function that is faster
than the obvious one. The way they keep track of this elusive information is
heavily based on the fact that their model does not allow simultaneous write
access to the same memory cell. (Indeed, their lower bound does not hold for
the CRCW PRAM.) As our model allows simultaneous write access, we had to
choose an approach which is different from theirs.

The  information  that  is  transferred  in  case  2  seems  even  more  slippery.  We 
know  what  was  written  into  C,  and  in  addition  we  know  that  no  processor  with 
serial number smaller than $j$ tried to write. (Note that as $\Sigma$ may be
infinite, the writer can encode its serial number in the symbol it writes.)
This case is much
simpler  in  the  exclusive  write  model,  since  there,  if  someone  writes,  there  can 
be  no  other  processor  that  tries  to  write! 

At this point we need some notation. Let $I$ denote the (non-empty) set of all
possible inputs (the domain). Fix a time period $t$ and let
$\beta = s^0 s^1 \cdots s^{t-1}$ be the string of successive symbols in $C$ in
time periods $0, 1, \ldots, t-1$. $\beta$ is called the history through time
$t$. Denote by $I_\beta \subseteq I$ the subset of inputs that have history
$\beta$ through time $t$.

Our analysis will be based on the observation that cases 1 and 2 consist
each of two subcases. Fix $\beta$, a history through time $t$.

Case 1a: There is no input in $I_\beta$ for which some processor writes at time $t$.

Case 1b: There is an input in $I_\beta$ for which some processor writes at time $t$.

Case 2a: There is no input in $I_\beta$ for which some processor with smaller
serial number than $j$ writes at time $t$.

Case 2b: There is an input in $I_\beta$ for which a processor with smaller
serial number than $j$ writes at time $t$.

It turns out that cases 1a and 2a are simple to analyze. Intuitively, in case
1a no new information is transferred, as $\beta$ itself contains the
information that no one will write at time $t$. Similarly, in case 2a, $\beta$
contains the information that no processor with a smaller serial number than
the writer could have written, so the only new piece of information is the new
symbol in $C$, $s^t$.
Now, rather than confronting the elusive information that is transferred in
cases 1b and 2b, we avoid (or circumvent) it, and hence coin the name
circumvention for this technique. Showing that we can restrict ourselves to
the "easy to analyze" cases is the heart of our argument.

Let $I(\{(i_1, y_1), \ldots, (i_l, y_l)\}) = \{ x \in I \mid x_{i_j} = y_j,\ 1 \le j \le l \}$
denote the set of all inputs ($n$-tuples) whose projection on the $l$-tuple
$(i_1, i_2, \ldots, i_l)$ is $(y_1, y_2, \ldots, y_l)$.

Remark:  We  switch  here  from  qualifying  inputs  by  their  history  ("range" 
qualification)  to  qualifying  them  by  their  values  at  given  coordinates  (domain 
qualification).  This  yields  a  simpler  and  more  intuitive  proof  than  our  original 
one  which  used  range  qualification.  However,  we  believe  that  range  qualification 
is  more  powerful,  and  that  it  may  be  used  to  prove  lower  bounds  when  domain 
qualification  fails. 

The following iterative definition will generate an "easy to analyze" set of
inputs, i.e. inputs for which cases 1b and 2b never occur. For every $t$,
$D^t$ will contain pairs of "fixed" input positions and their values, and
$E^t = I(D^t)$.

Let $E^0 = I$ and $D^0 = \emptyset$. Consider time period $t$ and define
$E^t, D^t$ according to the following:

Case 1: There is no processor $p_j$ and no input $x \in E^{t-1}$ such that
$p_j$ writes on $x$ at time $t$. Then

$E^t \leftarrow E^{t-1}$, $D^t \leftarrow D^{t-1}$.

Case 2: There is a processor $p_j$ and an input $x \in E^{t-1}$ s.t. $p_j$
writes on $x$ at time $t$. Let $p_l$ and $y \in E^{t-1}$ be so that $p_l$
writes on $y$ at time $t$, and $l$ is the smallest serial number of any
processor that writes at time $t$ on any input in $E^{t-1}$. Let
$i_1, i_2, \ldots, i_u$ and $y_{i_1}, y_{i_2}, \ldots, y_{i_u}$ be the sets of
input cells and their contents (respectively) that were read by $p_l$ up to
time $t$. (Clearly $u \le t$.) Then

$E^t \leftarrow E^{t-1} \cap I(\{(i_1, y_{i_1}), \ldots, (i_u, y_{i_u})\})$,

$D^t \leftarrow D^{t-1} \cup \{(i_1, y_{i_1}), \ldots, (i_u, y_{i_u})\}$.

It is easy to see that for every $0 \le t \le T$

(1) $|D^t| \le |D^{t-1}| + t$, $|D^0| = 0$ and hence $|D^t| \le t(t+1)/2$.

(2) $E^t = I(D^t)$, $D^{t-1} \subseteq D^t$, $E^t \subseteq E^{t-1}$.

(3) $E^t \ne \emptyset$.


In  particular  we  have: 

Lemma 2.1: $E^T \ne \emptyset$ and $|D^T| \le T(T+1)/2$.

Remark: The definition above generates a set $E^T$ of "easy to analyze" inputs,
regardless of the function being computed. Therefore we believe that this
technique can be used to prove lower bounds for the computation of other
functions in this model.

Lemma 2.2: Let $M$ be a CRCW PRAM(1) computing a function $f$, and let $E^T$ be
defined as above for $M$. Then for every $x, y \in E^T$, $f(x) = f(y)$.

A rigorous proof of this lemma is given in the appendix. The idea is to show
inductively on $t$ that any processor which writes at time $t$ on some input in
$E^t$ will have exactly the same computation through time $t$ on every input in
$E^T$.

Proof of theorem 2.1: Recall that $M$ computes a $k$-sensitive everywhere
function $f$ in time $T$. Suppose that $T(T+1)/2 < k$. Then $|D^T| < k$, and so
by definition 2.3 there must be inputs $x$ and $y$ in $E^T$ s.t.
$f(x) \ne f(y)$. This contradicts lemma 2.2. Therefore $T(T+1)/2 \ge k$, so
$T = \Omega(\sqrt{k})$. •
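The last step of the proof can be spelled out by solving the quadratic inequality; the explicit constants below are a worked illustration, not a statement from the paper.

```latex
\[
\frac{T(T+1)}{2} \ge k
\;\Longrightarrow\; T^2 + T - 2k \ge 0
\;\Longrightarrow\; T \ge \frac{-1 + \sqrt{1 + 8k}}{2} \ge \sqrt{2k} - \tfrac{1}{2}.
\]
% Example: for Parity on n = 100 bits we have k = 100, and
% 13 * 14 / 2 = 91 < 100 <= 105 = 14 * 15 / 2, so T >= 14.
```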

2.3. Lower Bounds for the CREW PRAM(1)

Consider the OR function of $n$ bits. As mentioned earlier, the OR is just
1-sensitive everywhere, so the results in the previous subsection imply only a
constant time lower bound for it on the CRCW PRAM(1). Indeed, there is a
two-step algorithm for the OR on this model as follows. In the first step, the
common memory cell $C$ is initialized with '0'. In the second step, processor
$p_i$ reads the $i$th input position and writes a '1' into $C$ iff the value it
read was '1'.
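The two-step algorithm can be sketched as follows. The function name `crcw_or` is hypothetical, and the second return value (the number of processors that attempt to write in step two) is added here only to make write conflicts visible.

```python
def crcw_or(bits):
    """Two-step OR on a CRCW PRAM(1); returns (contents of C, number of writers)."""
    C = 0  # step 1: the common cell is initialised with '0'

    # step 2: processor i reads the i-th input and tries to write '1' iff it read '1'
    writers = [i for i, b in enumerate(bits, start=1) if b == 1]
    if writers:
        # the lowest-numbered writer succeeds; all writers propose the same symbol
        C = 1
    return C, len(writers)
```

Whenever the second component exceeds 1, several processors write simultaneously, which is exactly what the exclusive-write model forbids.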

It is clear why this algorithm is not valid for a CREW PRAM. Note, however,
that if the domain consists only of inputs which have at most one position
containing a '1', a write conflict cannot occur, and the algorithm is valid for
the CREW PRAM. For this reason we will restrict ourselves here to functions
with a full domain (i.e. $I = \Sigma^n$). The main result in this subsection is:

Theorem 2.2: Let $N$ be a CREW PRAM(1) that computes a $k$-sensitive somewhere
function $g$ in time $T$. Then $T = \Omega(k^{1/3})$.

Corollary 2.2: If $g$ is the OR function on $n$ bits, then $T = \Omega(n^{1/3})$.

In an earlier version of this paper we conjectured that the lower bound of
corollary 2.2 can be improved to $T = \Omega(\sqrt{n})$. This was recently
proved by Beame [B-83]. In fact, he proved the following stronger theorem:
Theorem 2.3 (Beame): Let $N$ be a CREW PRAM(1) that computes a function
$g : \{0,1\}^n \to \{0,1\}$ in time $T$. If there exists an input $e \in I$
s.t. $|\{x \in I : g(x) = g(e)\}| \le |I|/r$, then
$T = \Omega(\sqrt{\log_2 r})$.

It immediately follows that:

Corollary 2.3 (Beame): If $g$ is the OR function on $n$ bits, then
$T = \Omega(\sqrt{n})$.

The proof of Theorem 2.3 has the same structure as that of Theorem 2.2.
However, while we focus on the sensitivity of inputs in the lower bound
argument, Beame focuses on a different parameter, namely the number of inputs
with the same image. His proof is of independent interest, and we include it in
the appendix.

We return to the proof of Theorem 2.2. The idea is to use the framework of
the previous subsection, namely to construct a set of inputs $E^T$, and show
that for the computed function to be constant on $E^T$, $T$ must be large. This
task was relatively easy for everywhere sensitive functions, since we did not
have to worry about the contents of $E^T$, as every input is sensitive. To use
the sensitivity of inputs in a somewhere sensitive function in a similar
argument we must make sure that $E^T$ contains at least one sensitive input.
This motivates the following inductive definition of the sets $D^t, E^t$.

Let $g : I \to \Sigma$ be the function being computed and
$e = e_1, e_2, \ldots, e_n \in I$ be a $k$-sensitive input w.r.t. $g$. Set
$E^0 = I$ and $D^0 = \emptyset$. Consider time period $t$ and define
$E^t, D^t$ as follows:

Case 1: There is no processor $p_j$ and no input $x \in E^{t-1}$ such that
$p_j$ writes on $x$ at time $t$. Then

$E^t \leftarrow E^{t-1}$, $D^t \leftarrow D^{t-1}$.

Case 2: There is a (unique) processor $p_j$ that writes on $e \in E^{t-1}$ at
time $t$. Let $i_1, i_2, \ldots, i_u$ and $e_{i_1}, e_{i_2}, \ldots, e_{i_u}$
be the sets of input cells and their contents (respectively) that were read by
$p_j$ up to time $t$. (Clearly $u \le t$.) Then

$D^t \leftarrow D^{t-1} \cup \{(i_1, e_{i_1}), \ldots, (i_u, e_{i_u})\}$,
$E^t \leftarrow E^{t-1} \cap I(\{(i_1, e_{i_1}), \ldots, (i_u, e_{i_u})\})$.

Case 3: There exists $x \in E^{t-1}$, $x \ne e$, s.t. some $p_j$ writes on $x$
at time $t$, but no processor writes on $e$ at time $t$. Let $R_0^t$ be a set
of positions s.t. if $y \in E^{t-1}$ and $y_i = e_i$ for all $i \in R_0^t$,
then no processor writes on $y$ at time $t$. In this case we fix the positions
$R_0^t$ with values of $e$:

$E^t \leftarrow E^{t-1} \cap I(\{(i, e_i) \mid i \in R_0^t\})$.

It is easy to see inductively that $e \in E^t$ for all $t$, and so
$e \in E^T$. Our main problem is to obtain an upper bound on $|R_0^t|$.

Lemma 2.3: For every $t$, $|R_0^t| \le t(t+1)/2$.

This  lemma  is  the  heart  of  the  lower  bound.  Since  the  proof  is  long,  it  is 
deferred  to  the  appendix. 

Lemma 2.4: For every $t$, $|D^t| \le t(t+1)(t+2)/6$ and $e \in E^t$.

Proof: By simple induction on $t$. •

Lemma 2.5: For every $x, y \in E^T$, $g(x) = g(y)$.

Proof:  Exactly  the  same  as  the  proof  of  lemma  2.2.  • 

Proof of theorem 2.2: Recall that $N$ computes a $k$-sensitive somewhere
function $g$ in time $T$. Suppose $T(T+1)(T+2)/6 < k$. Then by lemma 2.4
$|D^T| < k$. Since $e \in E^T$, by definition 2.3 there must be a $y \in E^T$
s.t. $g(y) \ne g(e)$, which contradicts lemma 2.5. •
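The contradiction gives $T(T+1)(T+2)/6 \ge k$. One way to extract the cube-root bound of Theorem 2.2 from this is sketched below; the explicit constant is an illustration, not a claim from the paper.

```latex
\[
k \le \frac{T(T+1)(T+2)}{6} \le \frac{(T+1)^3}{6}
\;\Longrightarrow\; T \ge (6k)^{1/3} - 1 ,
\]
% The middle inequality uses T(T+2) = (T+1)^2 - 1 <= (T+1)^2.
% Hence T = \Omega(k^{1/3}); for OR on n bits, k = n.
```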

2.4. Arbitrary Communication Width

What happens when the communication width is larger than 1? The CRCW
PRAM($m$) is defined similarly to the CRCW PRAM(1), only now there are $m$
common memory cells $C(1), C(2), \ldots, C(m)$ to which the processors have
concurrent read/write access. In a similar fashion the CREW PRAM($m$) can be
defined. Our results are summarized in the following theorem.

Theorem 2.4:

Let $M$ be a CRCW PRAM($m$) that computes a $k$-sensitive everywhere function
$f : I\ (\subseteq \Sigma^n) \to \Sigma$ in time $T$. Then
$T = \Omega(\sqrt{k/m})$.

In particular, if $f \in \{$Parity, Majority, Sum, Max$\}$,
$T = \Omega(\sqrt{n/m})$.

Let $N$ be a CREW PRAM($m$) that computes a $k$-sensitive somewhere function
$g : \Sigma^n \to \Sigma$. Then $T = \Omega((k/m)^{1/3})$.

In particular, if $g \in \{$AND, OR$\}$, $T = \Omega((n/m)^{1/3})$.

The only difficulty in extending our technique to prove theorem 2.4 is in the
definition of the "easy to analyze" cases. For example, one can construct a
machine for which the following happens: There are inputs for which both $C(1)$
and $C(2)$ are written into. However, if we choose an input for which the
smallest numbered processor writes into $C(1)$, no one will write into $C(2)$,
and vice versa.
We overcome this difficulty by conceptually serializing the write access into
different cells as follows: Each time unit $t$ is sliced into $m$ slices, so
that in the $i$th slice only cell $C(i)$ may be written into. Then, at the
$i$th slice of time period $t$ we can refer not only to the contents of all
cells at previous time periods, but also to the contents of cells 1 to $i-1$ at
time period $t$. (Note that the machine is not affected by this conceptual
slicing. Indeed, it shows that our results hold even in a stronger model that
allows the processors to access all common memory cells at each time unit.) As
a result we are able to define sets $E^{t,i}$ and $D^{t,i}$, $0 \le t \le T$,
$1 \le i \le m$, inductively in a similar fashion to the previous subsections
for the CRCW PRAM($m$) and the CREW PRAM($m$) respectively. The only
refinement is that instead of defining $E^t$ from $E^{t-1}$, we define
$E^{t,i}$ from $E^{t,i-1}$ when $i > 1$, and $E^{t,1}$ from $E^{t-1,m}$.

The analysis of the previous subsections carries through in a straightforward
manner w.r.t. the final sets, $E^{T,m}$ and $D^{T,m}$. This includes the proof
of the following two lemmas and the conclusion of the theorem from them.

Lemma 2.6:

In the CRCW PRAM($m$), $|D^{T,m}| \le m(T+1)T/2$.
In the CREW PRAM($m$), $|D^{T,m}| \le mT(T+1)(T+2)/6$.

Lemma 2.7:

In the CRCW PRAM($m$), for every $x, y \in E^{T,m}$, $f(x) = f(y)$.
In the CREW PRAM($m$), for every $x, y \in E^{T,m}$, $g(x) = g(y)$.
We  conclude  this  subsection  with  two  observations: 
1)   The   ideas    outlined   above   can   be   used   to   extend   also   Beame's    theorem 
(Theorem  2.3)  for  arbitrary  communication  width,  as  follows. 

Theorem 2.5: Let $N$ be a CREW PRAM($m$) that computes a function
$g : \{0,1\}^n \to \{0,1\}$ in time $T$. If there exists an input $e \in I$
s.t. $|\{x \in I : g(x) = g(e)\}| \le |I|/r$, then
$T = \Omega(\sqrt{(\log_2 r)/m})$.

In particular, if $g$ is the OR function, then $T = \Omega(\sqrt{n/m})$.

2) Two other concurrent-write models of parallel computation have appeared in
the literature ([SV-81], [ShV-82]). They differ from our CRCW PRAM in the way
they resolve write conflicts. In the first, all processors that access the same
memory location must write the same value. In the second there is no such
restriction, but we do not know in advance which processor succeeds in writing.
We conclude this section by mentioning that those two models are weaker than
ours, and therefore our results for the CRCW PRAM hold for them as well.

3.  Upper  Bounds 

All upper bounds can be achieved in the weakest version of a PRAM, namely
the Exclusive-Read Exclusive-Write PRAM (EREW PRAM). It is similar to the CREW
PRAM, except that in this model any simultaneous access to a shared memory cell
is forbidden. The algorithms are simple and will be described informally. They
will be given only for the problem of summing $n$ numbers. It is easy to see
that they hold for computing any associative function.

Consider first the EREW PRAM(1) model. The $n$ numbers $a_1, a_2, \ldots, a_n$
are initially stored in the read-only input tape. Let $L_j$ be a local memory
cell of processor $P_j$, and let $C$ be the common memory cell. The algorithm
is described in Figure 1.


Time   P1          P2            P3            P4             ...

1      L1 ← a1     L2 ← a2       L3 ← a4       L4 ← a7        ...
       C ← C+L1

2                  L2 ← L2+a3    L3 ← L3+a5    L4 ← L4+a8     ...
                   C ← C+L2

3                                L3 ← L3+a6    L4 ← L4+a9     ...
                                 C ← C+L3

4                                              L4 ← L4+a10    ...
                                               C ← C+L4

Figure 1: Summation with one common memory cell


Clearly, only $p = O(\sqrt{n})$ processors are active in this algorithm, and
the sum is computed in $O(\sqrt{n})$ time. Since sequential time for summation
is $\Omega(n)$, a straightforward lower bound of $\Omega(n/p)$ exists for any
parallel machine with $p$ processors. Hence the number of processors is optimal
up to a constant factor.
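The staggered schedule of Figure 1 can be simulated directly. The sketch below assumes that processor $P_j$ is assigned the next block of $j$ inputs and adds its local sum to $C$ in the same step in which it reads its last input; the function name and the return format are hypothetical.

```python
def sum_with_one_cell(a):
    """Return (sum, processors used, time steps) for the staggered schedule."""
    # Processor j (1-based) gets the next block of j inputs: a1 | a2 a3 | a4 a5 a6 | ...
    blocks, start, size = [], 0, 1
    while start < len(a):
        blocks.append(a[start:start + size])
        start += size
        size += 1

    p = len(blocks)
    local = [0] * p  # local cells L_1 .. L_p
    C = 0            # the single common memory cell
    for t in range(1, p + 1):
        writers = 0
        for j in range(1, p + 1):
            block = blocks[j - 1]
            if t <= len(block):
                local[j - 1] += block[t - 1]  # L_j <- L_j + next input of its block
            if t == j:
                C += local[j - 1]             # C <- C + L_j, done only by P_j at step j
                writers += 1
        assert writers <= 1  # exclusive write: at most one processor touches C per step
    return C, p, p
```

Since the blocks have sizes $1, 2, \ldots, p$, the number of processors is the smallest $p$ with $p(p+1)/2 \ge n$, which is $O(\sqrt{n})$.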
Consider now the same problem for the CRCW PRAM($m$), where
$m = O(n/\log^2 n)$. We show how to achieve $O(\sqrt{n/m})$ time with
$O(\sqrt{nm})$ processors. The algorithm has two phases:

(1) Partition the $n$ inputs into $m$ subsets of size roughly $n/m$ each.
Assign to each subset $\sqrt{n/m}$ processors and one common memory cell. For
each subset the sum is computed in the respective memory cell using the
algorithm above in time $O(\sqrt{n/m})$.

(2) Sum up the $m$ values in the common memory using $m$ ($\le \sqrt{nm}$)
processors in $O(\log m)$ time in the obvious way.

As before, the number of processors used is optimal up to a constant factor.
This upper bound establishes that our lower bound for Parity on the CRCW
PRAM($m$) and Beame's lower bound for the OR on the CREW PRAM($m$) are tight.
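The restriction $m = O(n/\log^2 n)$ is what keeps the second phase from dominating the running time. A short check, assuming $m \le n$ and writing $c$ for the constant hidden in the $O$-notation:

```latex
\[
T_{\text{total}} = O\!\left(\sqrt{n/m}\right) + O(\log m), \qquad
m \le \frac{c\,n}{\log^2 n}
\;\Longrightarrow\;
\log m \le \log n \le \sqrt{c\,n/m},
\]
\[
\text{so } T_{\text{total}} = O\!\left(\sqrt{n/m}\right), \qquad
p \cdot T = \sqrt{nm} \cdot \sqrt{n/m} = n .
\]
% The product p * T = n matches the Omega(n/p) lower bound on the time,
% which is why the processor count is optimal up to a constant factor.
```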

We conclude by mentioning what is known when the communication width is
larger than the input size. If the input values are taken from a finite domain,
the sum can be computed in constant time using exponential width and number of
processors. If those two resources are bounded by a polynomial in $n$, the best
upper bound known is $O(\log n/\log\log n)$ [CSV-82].

4. Conclusions and Open Problems

Using communication based arguments to prove lower bounds in Computer
Science is an old idea. The crossing-sequence [HU-79] technique in Turing
machines essentially measures communication between work-tape cells. This
technique was extended to measure communication between two halves of a
VLSI circuit [Y-81, LS-81, PS-82] and obtain Time-Area trade-offs.

We consider this paper to be a first step towards understanding the central
role played by communication in efficient parallel computation. The view of
communication as a resource in parallel machines gives rise to many questions.
We mention a few below.

(1) Our lower bound for the OR on the CREW PRAM, combined with that of Cook
and Dwork, covers the whole range of $m$. On the other hand, the lower bound
for the parity function on the CRCW PRAM($m$) becomes trivial when $m \ge n$.
The case where $m$ is only bounded by a polynomial in $n$ is of particular
interest, since a lower bound on the time here will give a lower bound on the
depth of polynomial size Parity circuits.
(2) Consider parallel RAMs in which processors are allowed to be probabilistic
or non-deterministic. In the deterministic version of the CRCW PRAM(1) which we
studied here, both the Parity and the Max functions have an $\Omega(\sqrt{n})$
lower bound on the time. If we allow non-determinism, the maximum of $n$
numbers can be computed in constant time. However, we conjecture that the lower
bound still holds for Parity even in the non-deterministic model.

(3)  Study  Time  -  Width  -  Processors  trade-offs  for  other  functions. 
Appendix

Lemma 2.2: For every $x, y \in E^T$, $f(x) = f(y)$.

Proof of lemma 2.2: We use the following notation. For an input $x \in I$:

$q_i^t(x)$ and $s^t(x)$ are respectively the state of $p_i$ and the contents of
$C$ in time $t$ for the input $x$.

$R_j^t(x) = \{ \rho_j(q_j^\tau(x)) \mid 0 \le \tau < t \}$ is the set of input
cells read by $p_j$ through time $t$. Set $R_0^t(x) = \emptyset$.

$w^t(x)$ is the index of the processor that writes at time $t$ on input $x$. If
there is no such processor at time $t$ for $x$, $w^t(x) = 0$.

$W^t(x) = \bigcup_{\tau \le t} R_j^\tau(x)$, where $j = w^\tau(x)$, is the set
of input cells read by all writers through time $t$.

$FW^t(x) = \{ w^\tau(x) \mid t \le \tau \le T,\ w^\tau(x) \ne 0 \}$ is the set
of future writers from time period $t$ on.

Let $x$ and $y$ be elements in $E^T$. It is sufficient to show that
$s^T(x) = s^T(y)$. We prove by induction on $t$ that $w^t(x) = w^t(y)$,
$W^t(x) = W^t(y)$, $s^t(x) = s^t(y)$, and that for every $j \in FW^t(x)$,
$q_j^t(x) = q_j^t(y)$ and $R_j^t(x) = R_j^t(y)$.

$t = 0$. For every processor $j$, $q_j^0(x) = q_j^0(y) = q^0$ and
$R_j^0(x) = R_j^0(y) = \emptyset$. Also, $s^0(x) = s^0(y) = b_0$,
$W^0(x) = W^0(y) = \emptyset$ and $w^0(x) = w^0(y) = 0$.

$t > 0$. Assume the claim holds for every $\tau < t$. Let $j \in FW^t(x)$. Let
$i(x) = \rho_j(q_j^{t-1}(x))$ and $i(y) = \rho_j(q_j^{t-1}(y))$. By the induction
hypothesis $i(x) = i(y)$ and hence $R_j^t(x) = R_j^t(y)$. Since $j \in FW^t(x)$,
$R_j^t(x) \subseteq W^T(x)$, which using $x, y \in E^T$ implies that
$x_{i(x)} = y_{i(y)}$. From this and the induction hypothesis we get
$q_j^t(x) = q_j^t(y)$. Let $\sigma_j(x) = \sigma_j(q_j^t(x))$ and
$\sigma_j(y) = \sigma_j(q_j^t(y))$. Then $\sigma_j(x) = \sigma_j(y)$.
There are two cases to consider now.

case 1: $s^t(x) = s^{t-1}(x)$. By the construction of $E^t$ and induction,
$s^t(y) = s^t(x) = s^{t-1}(x)$, $w^t(x) = w^t(y) = 0$ and
$W^t(x) = W^t(y) = W^{t-1}(x)$.

case 2: $s^t(x) \ne s^{t-1}(x)$. Let $j = w^t(x)$. Again by construction of $E^t$,
there can be no $k < j$ s.t. $\sigma_k(q_k^t(y)) \ne s^{t-1}(y)$, and since
$\sigma_j(x) = \sigma_j(y)$ we have $w^t(y) = j = w^t(x)$, $s^t(y) = s^t(x)$,
and $W^t(y) = W^t(x)$. ∎

Lemma 2.3: For every $t$, $|R^t| \le t(t+1)/2$.

Proof of lemma 2.3: Denote by $Z^+$ the set of nonnegative integers, and let
$i, j, k, l$ denote only positive integers. Also, for a subset
$S \subseteq \{1, 2, \ldots, n\}$ and inputs $x, y$:

$x \equiv y \pmod{S}$ means $x_i = y_i$ for all $i \in S$.

$I_e(S) = \{ x \in I \mid x \equiv e \pmod{S} \}$.

Claim: Given an integer $t$, a set $S \subseteq \{1, 2, \ldots, n\}$, a function
$h: I_e(S) \to Z^+$, and sets $S_j \subseteq \{1, 2, \ldots, n\}$ for every
positive $j \in h(I_e(S))$ that satisfy

(1) $S_j \cap S = \emptyset$ for all $j$.

(2) $|S_j| \le t$ for all $j$.

(3) $h(e) = 0$.

(4) $h(x) = j$ and $y \equiv x \pmod{S_j}$ implies $h(y) = j$.

(5) $h(x) = j$, $h(y) = k$ and $j \ne k$ implies that there exists an
$i \in S_j \cap S_k$ s.t. $x_i \ne y_i$.

Then there exists a set $R \subseteq \{1, 2, \ldots, n\}$ s.t. $|R| \le t(t+1)/2$
and $h(I_e(S \cup R)) = \{0\}$.

Connection between the claim and the lemma. Recall that we wanted to prove
the existence of $t(t+1)/2$ input positions s.t. fixing them with values of $e$
will ensure that no one writes at time $t$ in case 3. Let $R_j$ be defined as in
the proof of lemma 2.2. Then let $S$ be the set of fixed input positions through
time $t$ ($S = R^{t-1}$, $I_e(S) = E^{t-1}$), $S_j = R_j - S$ for all $j$, and
let the function $h: I_e(S) \to Z^+$ be defined by $h(x) = j$ if $p_j$ is the
(unique) processor that writes on $x$ at time $t$, and $h(x) = 0$ if no one
writes on $x$ at time $t$. Let us verify that properties (1)-(5) hold.

(1) By the definition of $S_j$.

(2) $|S_j| = |R_j - S| \le |R_j| \le t$.

(3) We deal here only with case 3, in which no processor writes on $e$.

(4) Since $x, y \in I_e(S)$ and $x \equiv y \pmod{S_j}$, $x \equiv y \pmod{R_j}$.
With an almost identical proof to that of lemma 2.2 we can prove that
$q_j^t(x) = q_j^t(y)$, i.e. $P_j$ will arrive at the same state at time $t$
for both inputs $x$ and $y$. In particular, $P_j$ will write on $x$ if and
only if it will write on $y$ at time $t$.

(5) Suppose not. Then define an input $z$ by $z_i = x_i$ if $i \in S_j$,
$z_i = y_i$ if $i \in S_k - S_j$, and $z_i = e_i$ for the remaining values
of $i$. Clearly $z \in I_e(S) = E^{t-1}$. Since $x$ and $y$ agree on
$S_j \cap S_k$, we have $z \equiv x \pmod{S_j}$ and $z \equiv y \pmod{S_k}$,
so by (4) both $p_j$ and $p_k$ write at time $t$ on $z$, contradicting the
definition of the CREW PRAM.

Now we can take $R^t = R$, which completes the proof of the lemma. ∎

Proof of Claim: The proof is by induction on $t$.

$t = 0$. In this case $h(I_e(S)) = \{0\}$. Otherwise, if for some $x \in I_e(S)$
there exists $j > 0$ s.t. $h(x) = j$, then $S_j = \emptyset$ by (2), so by (4)
also $h(e) = j$, a contradiction to (3).

$t > 0$. If $h(I_e(S)) = \{0\}$ we are done. Assume that for some $x \in I_e(S)$,
$h(x) = i > 0$. Set $S' = S \cup S_i$, let $h'$ be the restriction of $h$ to
$I_e(S')$, and $S'_j = S_j - S_i$ for all $j \in h'(I_e(S'))$. Then we have the
following:

(1') $S' \cap S'_j = \emptyset$ for all $j$. Clear.

(2') $|S'_j| \le t-1$ for all $j \in h'(I_e(S'))$. Since $i, j \in h(I_e(S))$,
by (5) $S_i \cap S_j \ne \emptyset$, and therefore
$|S'_j| = |S_j - S_i| \le |S_j| - 1 \le t - 1$.

(3') $e \in I_e(S')$ and $h'(e) = 0$. Clear.

(4') $h'(x) = j$, $y \equiv x \pmod{S'_j}$ implies $h'(y) = j$. Since
$x, y \in I_e(S')$, $y \equiv x \equiv e \pmod{S_i}$ and therefore
$y \equiv x \pmod{S_j}$. Hence $j = h'(x) = h(x) = h(y) = h'(y)$.

(5') $h'(x) = j$, $h'(y) = k$, $j \ne k$ implies that there is an
$l \in S'_j \cap S'_k$ s.t. $x_l \ne y_l$. Since $h(x) = j$, $h(y) = k$ there
must be such an $l$ in $S_j \cap S_k$. However, since
$x \equiv y \equiv e \pmod{S_i}$, $l$ cannot lie in $S_i$ and so must belong
to $S'_j \cap S'_k$.

By the induction hypothesis, there exists a set $R'$ s.t. $|R'| \le (t-1)t/2$
and $h'(I_e(S' \cup R')) = \{0\}$. Set $R = R' \cup S_i$. Then clearly
$|R| \le t(t+1)/2$ and
$h(I_e(S \cup R)) = h(I_e(S' \cup R')) = h'(I_e(S' \cup R')) = \{0\}$. ∎
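
The inductive construction of $R$ in the proof of the Claim can be traced on a
small instance. The following is only an illustrative sketch by brute force over
$\{0,1\}^n$; the names `inputs_fixed_to_e` and `find_R`, the 0-based input
positions, and the toy writer function `h` are assumptions made for this
example and do not appear in the paper.

```python
from itertools import product

def inputs_fixed_to_e(n, e, S):
    """I_e(S): all x in {0,1}^n with x_i = e_i for every i in S (0-based)."""
    return [x for x in product((0, 1), repeat=n)
            if all(x[i] == e[i] for i in S)]

def find_R(n, e, S, h, S_sets):
    """Follow the induction: pick some writer i, fix S_i to e, shrink the S_j."""
    for x in inputs_fixed_to_e(n, e, S):
        i = h(x)
        if i > 0:
            S_i = S_sets[i]
            if not S_i:
                # An empty S_i with h(x) = i would force h(e) = i by (4),
                # contradicting (3), so the hypotheses cannot all hold.
                raise ValueError("properties (1)-(5) are violated")
            reduced = {j: s - S_i for j, s in S_sets.items()}
            return S_i | find_R(n, e, S | S_i, h, reduced)
    return set()  # h is already 0 on all of I_e(S)

# Hypothetical toy instance with t = 2: processor 1 writes iff x_0 = 1,
# processor 2 writes iff x_0 = 0 and x_1 = 1; nobody writes on e = 0000.
n, t = 4, 2
e = (0, 0, 0, 0)

def h(x):
    if x[0] == 1:
        return 1
    if x[1] == 1:
        return 2
    return 0

S_sets = {1: {0}, 2: {0, 1}}
R = find_R(n, e, set(), h, S_sets)
```

In this toy instance the sets $S_1 = \{0\}$ and $S_2 = \{0, 1\}$ satisfy (1)-(5)
with $S = \emptyset$, and the returned set has size at most $t(t+1)/2 = 3$.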

Theorem 2.3: Let $N$ be a CREW PRAM(1) that computes a function $g$ in time $T$
such that $\exists e \in I$ ($I = \{0,1\}^n$) for which
$|\{ x \in I \mid g(x) = g(e) \}| \le |I| / r$. Then
$T = \Omega(\sqrt{\log_2 r})$.

Set $E^0 = I$ and $F^0 = I \setminus E^0 = \emptyset$. Consider time period $t$.
For any $j$ let $P_j^t$ be the set of input positions read by processor $p_j$ up
to time $t$ and define $E^t$ and its complement $F^t$ as follows:

Case 1: There is no processor $p_j$ and no input $x \in E^{t-1}$ such that $p_j$
writes on $x$ at time $t$: Then $E^t \leftarrow E^{t-1}$, $F^t \leftarrow F^{t-1}$.

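Case 2: No processor writes on $e$ at time $t$ but there is an $x \in E^{t-1}$
such that some $p_j$ writes on $x$ at time $t$:

For every input $x \in E^{t-1}$ which causes a processor $p_j$ to write at time
$t$ define

$C_x^t = \{ (i, x_i) \mid i \in P_j^t \}$

(identified with the set of inputs that take these values).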

Each $C_x^t$ is specified by the values in at most $t$ input cells since
$|P_j^t| \le t$. It is clear that any $x \in E^{t-1}$ which causes a write at
time $t$ is in some $C_x^t$. Also any $y \in E^{t-1} \cap C_x^t$ will cause a
write at time $t$ since processor $p_j$ at time $t$ will not be able to
distinguish $y$ from $x$. Thus if we eliminate the elements of these "cubes"
from $E^{t-1}$ no writes will occur at time $t$.

The "cubes" also satisfy an additional property. If $C_x^t \cap C_y^t \ne \emptyset$
then their shared specifying positions must agree in value. Therefore, if
$C_x^t \ne C_y^t$ then $C_x^t$ and $C_y^t$ must be specified by different input
cells and so correspond to different processors. It follows then that
$C_x^t \cap C_y^t \subseteq I \setminus E^{t-1} = F^{t-1}$, otherwise there would
be a simultaneous write, which is not allowed.
Thus if we designate the distinct cubes as $\{C_i^t\}$ then

$E^t \leftarrow E^{t-1} \setminus (\bigcup_i C_i^t)$ and
$F^t \leftarrow F^{t-1} \cup (\bigcup_i C_i^t)$, where $\forall i \ne j$,
$C_i^t \cap C_j^t \subseteq F^{t-1}$.

Case 3: There is a (unique) processor $p_j$ that writes on $e \in E^{t-1}$ at
time $t$: Then we require that the input agree with $e$ in the positions of
$P_j^t$. We may regard this as requiring that the input be in the cube which is
the subset of the input specified by these $|P_j^t| \le t$ values. Equally well
this may be regarded as excluding from the input all values which are in the
cubes specified by the other $2^{|P_j^t|} - 1$ possible settings of values in
these positions. If we call these excluded cubes $\{C_i^t\}$ as in case 2, it is
immediate that $\forall i \ne j$, $C_i^t \cap C_j^t = \emptyset \subseteq F^{t-1}$.
Then as in case 2 we have

$E^t \leftarrow E^{t-1} \setminus (\bigcup_i C_i^t)$ and
$F^t \leftarrow F^{t-1} \cup (\bigcup_i C_i^t)$.
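
The two facts about "cubes" used in cases 2 and 3 can be checked mechanically on
small inputs. The sketch below is illustrative only: it represents a cube as a
Python dict from (0-based) input position to bit, and the helper names
`cube_members`, `cubes_intersect` and `excluded_cubes` are hypothetical, not
taken from the paper.

```python
from itertools import product

def cube_members(cube, n):
    """All inputs in {0,1}^n that agree with the cube on its specified positions."""
    return {x for x in product((0, 1), repeat=n)
            if all(x[i] == b for i, b in cube.items())}

def cubes_intersect(c1, c2):
    """Two cubes share an input iff they agree on every shared position."""
    return all(c2[i] == b for i, b in c1.items() if i in c2)

def excluded_cubes(e, positions):
    """Case 3: one cube per setting of `positions` other than the setting of e."""
    positions = sorted(positions)
    agree = tuple(e[i] for i in positions)
    return [dict(zip(positions, setting))
            for setting in product((0, 1), repeat=len(positions))
            if setting != agree]

n = 5
c1, c2, c3 = {0: 1, 2: 0}, {2: 0, 3: 1}, {0: 0}
e = (0,) * n
others = excluded_cubes(e, {1, 3, 4})  # 2^3 - 1 pairwise disjoint cubes
```

Here `c1` and `c2` agree on their only shared position and therefore intersect,
while `c1` and `c3` disagree on position 0 and are disjoint; a cube specified by
$k$ cells contains $|I| / 2^k$ inputs.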

Lemma 2.8: For any $t \ge 0$ and any "cube" $C$ which is specified by at most $s$
cells of the input, $\exists$ an integer $r$ such that
$|C \cap F^t| = r \, |I| / 2^{s + t(t+1)/2}$.

Proof: By induction on $t$.

$t = 0$: $F^0 = \emptyset$ so the claim is true with $r = 0$.

Assume claim for $t-1$:

$F^t = F^{t-1} + \sum_i (C_i^t \setminus F^{t-1})$ since $\forall i \ne j$,
$C_i^t \cap C_j^t \subseteq F^{t-1}$ (we use additive notation for disjoint
union).

Therefore

$|C \cap F^t| = |C \cap (F^{t-1} + \sum_i (C_i^t \setminus F^{t-1}))|$

$= |C \cap F^{t-1}| + \sum_i |(C \cap C_i^t) \setminus F^{t-1}|$

$= |C \cap F^{t-1}| + \sum_i (|C \cap C_i^t| - |(C \cap C_i^t) \cap F^{t-1}|)$

If we only sum over non-empty intersections then $C \cap C_i^t$ is a subset of
the input which restricts the input only by specifying input positions and which
is specified by at most $s + t$ of them. Thus we may designate
$C_i^{s+t} = C \cap C_i^t$. Therefore

$|C \cap F^t| = |C \cap F^{t-1}| + \sum_i (|C_i^{s+t}| - |C_i^{s+t} \cap F^{t-1}|)$

$= p \, |I| / 2^{s + t(t-1)/2} + \sum_i ( |I| / 2^{n_i} - q_i \, |I| / 2^{s + t + t(t-1)/2} )$,
where $p$, $q_i$, $n_i$ are integers and $n_i \le s + t$.

This follows by the inductive hypothesis for the first and last terms and because
of the form of the middle terms.

Since all of the denominators divide $2^{s + t(t+1)/2}$ the claim holds for $t$
and the Lemma is proved. ∎
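
The divisibility step at the end of the proof of Lemma 2.8 rests on the identity
$s + t + t(t-1)/2 = s + t(t+1)/2$. As a hedged numeric illustration, the sketch
below works in units of $|I|$ with exact fractions; the function name
`step_value` and the sample integers are arbitrary choices made for this
example only.

```python
from fractions import Fraction

def step_value(s, t, p, middle_exps, qs):
    """p/2^(s+t(t-1)/2) + sum_i (1/2^(n_i) - q_i/2^(s+t+t(t-1)/2)), in units of |I|."""
    prev = s + t * (t - 1) // 2          # exponent from the inductive hypothesis
    total = Fraction(p, 2 ** prev)
    for n_i, q_i in zip(middle_exps, qs):
        # n_i <= s + t: the cube C ∩ C_i^t is specified by at most s + t cells
        total += Fraction(1, 2 ** n_i) - Fraction(q_i, 2 ** (prev + t))
    return total

s, t = 1, 3
value = step_value(s, t, p=5, middle_exps=[2, 4], qs=[3, 7])
bound = 2 ** (s + t * (t + 1) // 2)      # 2^(s + t(t+1)/2)
```

With these sample values every denominator is a power of two no larger than
`bound`, so `value * bound` is an integer, as the lemma requires.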

Lemma 2.9: $e \in E^T$ and $\forall x \in E^T$, $g(x) = g(e)$. ∎

Proof of Theorem 2.3: If we apply Lemma 2.8 with $s = 0$ then $C = I$ and we see
that $|F^T|$ is an integral multiple of $|I| / 2^{T(T+1)/2}$. Since
$E^T = I \setminus F^T$ it follows that $|E^T|$ is also a multiple of this
number. Now $e \in E^T$ so we have $|E^T| > 0$ and thus
$|E^T| \ge |I| / 2^{T(T+1)/2}$. By our assumption on $g$ and by Lemma 2.9 we need
$|E^T| \le |I| / r$. Therefore $2^{T(T+1)/2} \ge r$ and so
$T = \Omega(\sqrt{\log_2 r})$. ∎
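
To make the final inequality concrete, the following sketch computes the smallest
integer $T$ with $2^{T(T+1)/2} \ge r$, given $\log_2 r$. The function name
`min_time` is hypothetical. As an assumed illustration, take $g$ to be the
Boolean Or with $e = 0^n$: then $g(x) = g(e)$ only for $x = e$, so $r = 2^n$ and
the condition becomes $T(T+1)/2 \ge n$.

```python
def min_time(log2_r):
    """Smallest integer T >= 0 with T*(T+1)/2 >= log2_r, i.e. 2^(T(T+1)/2) >= r."""
    T = 0
    while T * (T + 1) // 2 < log2_r:   # T*(T+1) is always even, so // is exact
        T += 1
    return T

# Illustration for Or on n = 100 input bits (r = 2^100 under the assumption above).
T_or = min_time(100)
```

The resulting value grows like $\sqrt{2 \log_2 r}$, in line with the
$\Omega(\sqrt{\log_2 r})$ bound of the theorem.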